Kevin Mader
20 March 2014
ETHZ: 227-0966-00L
We have dramatically simplified our data, but there is still too much.
56GB / sample
\[ \downarrow \]
(1.75GB / sample)
\[ I_{id}(x,y) = \begin{cases} 1, & L(x,y) = id \\ 0, & \text{otherwise} \end{cases} \]
\[ \bar{x} = \frac{1}{N} \sum_{\vec{v}\in I_{id}} \vec{v}\cdot\vec{i} \] \[ \bar{y} = \frac{1}{N} \sum_{\vec{v}\in I_{id}} \vec{v}\cdot\vec{j} \] \[ \bar{z} = \frac{1}{N} \sum_{\vec{v}\in I_{id}} \vec{v}\cdot\vec{k} \]
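As a minimal sketch (assuming numpy and a small hypothetical 2D label image `label_img`), the center of volume is just the mean of the object's voxel coordinates:

```python
import numpy as np

# Hypothetical labeled image: one 3x3 square object on a 2D grid
label_img = np.zeros((10, 10), dtype=int)
label_img[1:4, 1:4] = 1  # object 1 occupies rows 1-3, cols 1-3

# Center of volume of object 1: mean of its voxel coordinates
coords = np.argwhere(label_img == 1)  # N x 2 array of (row, col) positions
com = coords.mean(axis=0)
print(com)  # -> [2. 2.]
```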
If the gray values are kept (or other meaningful weights are used), this can be seen as a weighted center of volume or center of mass (we write \( I_{gy} \) for the gray-value image to distinguish it from the labels).
\[ \Sigma I_{gy} = \sum_{\vec{v}\in I_{id}} I_{gy}(\vec{v}) \] \[ \bar{x} = \frac{1}{\Sigma I_{gy}} \sum_{\vec{v}\in I_{id}} (\vec{v}\cdot\vec{i}) I_{gy}(\vec{v}) \] \[ \bar{y} = \frac{1}{\Sigma I_{gy}} \sum_{\vec{v}\in I_{id}} (\vec{v}\cdot\vec{j}) I_{gy}(\vec{v}) \] \[ \bar{z} = \frac{1}{\Sigma I_{gy}} \sum_{\vec{v}\in I_{id}} (\vec{v}\cdot\vec{k}) I_{gy}(\vec{v}) \]
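The gray-weighted variant can be sketched the same way (assuming numpy; the tiny gray-value image here is synthetic, with one voxel three times brighter than the other):

```python
import numpy as np

# Hypothetical gray-value image: two object voxels, the right one brighter
gray = np.zeros((5, 5))
gray[2, 1] = 1.0
gray[2, 3] = 3.0
labels = (gray > 0).astype(int)

coords = np.argwhere(labels == 1)   # object voxel positions, in C order
weights = gray[labels == 1]         # matching gray values, same order
com_w = (coords * weights[:, None]).sum(axis=0) / weights.sum()
print(com_w)  # -> [2.  2.5]  (pulled toward the brighter voxel)
```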
Extents or caliper lengths are the size of the object in a given direction. Since the coordinates of our image are \( x \) and \( y \), the extents are calculated in these directions.
Define the extent in each direction as the difference between the maximum and minimum values of the shape's projection onto that direction \[ \text{Ext}_x = \max_{\vec{v}\in I_{id}}(\vec{v}\cdot\vec{i}) - \min_{\vec{v}\in I_{id}}(\vec{v}\cdot\vec{i}) \] \[ \text{Ext}_y = \max_{\vec{v}\in I_{id}}(\vec{v}\cdot\vec{j}) - \min_{\vec{v}\in I_{id}}(\vec{v}\cdot\vec{j}) \] \[ \text{Ext}_z = \max_{\vec{v}\in I_{id}}(\vec{v}\cdot\vec{k}) - \min_{\vec{v}\in I_{id}}(\vec{v}\cdot\vec{k}) \]
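A minimal sketch of the extents computation (assuming numpy and a hypothetical label image containing one rectangle):

```python
import numpy as np

# Hypothetical label image containing a 2 x 4 pixel rectangle
label_img = np.zeros((8, 8), dtype=int)
label_img[1:3, 1:5] = 1

coords = np.argwhere(label_img == 1)              # object voxel positions
extents = coords.max(axis=0) - coords.min(axis=0)  # max - min per axis
print(extents)  # -> [1 3]
```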
Anisotropy is, by definition (New Oxford American), "varying in magnitude according to the direction of measurement."
Because this definition is so vague, anisotropy can be characterized mathematically in many different, and very much unequal, ways (in all cases 0 represents a sphere)
\[ Aiso1 = \frac{\text{Longest Side}}{\text{Shortest Side}} - 1 \]
\[ Aiso2 = \frac{\text{Longest Side}-\text{Shortest Side}}{\text{Longest Side}} \]
\[ Aiso3 = \frac{\text{Longest Side}}{\text{Average Side Length}} - 1 \]
\[ Aiso4 = \frac{\text{Longest Side}-\text{Shortest Side}}{\text{Average Side Length}} \]
\[ \cdots \rightarrow \text{ ad nauseam} / \infty \]
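The four definitions above can be sketched as a small function (assuming here that "Average Side Length" means the mean of the longest and shortest sides):

```python
def anisotropy_metrics(long_side, short_side):
    """Four of the many possible anisotropy definitions (0 = sphere)."""
    avg = (long_side + short_side) / 2  # assumed average side length
    return (
        long_side / short_side - 1,            # Aiso1
        (long_side - short_side) / long_side,  # Aiso2
        long_side / avg - 1,                   # Aiso3
        (long_side - short_side) / avg,        # Aiso4
    )

# Matches the table row below with Y extent 1.00 (X extent fixed at 5)
print(anisotropy_metrics(5.0, 1.0))
```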
Let's take some sample objects
| Y Extent | Aiso1 | Aiso2 | Aiso3 | Aiso4 |
|---|---|---|---|---|
| 0.00 | 4999.00 | 1.00 | 1.00 | 2.00 |
| 0.01 | 499.00 | 1.00 | 1.00 | 1.99 |
| 0.10 | 49.00 | 0.98 | 0.96 | 1.92 |
| 1.00 | 4.00 | 0.80 | 0.67 | 1.33 |
| 2.00 | 1.50 | 0.60 | 0.43 | 0.86 |
| 3.00 | 0.67 | 0.40 | 0.25 | 0.50 |
| 4.00 | 0.25 | 0.20 | 0.11 | 0.22 |
| 5.00 | 0.00 | 0.00 | 0.00 | 0.00 |
Objects with uniformly distributed, independent \( x \) and \( y \) extents
While easy to calculate, the bounding box / extents approach is a very rough approximation for most of the objects in our image. In particular, objects that are not aligned with the \( XY \)-axes are misrepresented.
While many of the topics covered in linear algebra and statistics courses might not seem very applicable to real problems at first glance, at least a few of them come in handy for dealing with distributions of pixels (they will only be covered briefly; for a more detailed review, look at some of the suggested material).
PCA is similar to K-Means insofar as we start with a series of points in a vector space and want to condense the information. With PCA, instead of searching for distinct groups, we try to find a linear combination of components that best explains the variance in the system.
As an example, we will use a very simple dataset of corn and chicken prices versus time.
The first principal component condenses the correlated information in both the chicken and corn prices (perhaps the underlying cost of fuel) since it explains the most variance in the final table of corn and chicken prices.
The second principal component is then related to the unique information separating chicken prices from corn prices, but not to either price directly (maybe the cost of antibiotics).
Going back to a single cell, we have a distribution of \( x \) and \( y \) values.
A principal component analysis of the voxel positions will calculate two new principal components. (The components themselves are the relationships between the input variables; the scores are the final values.)
We start off by calculating the covariance matrix from the list of \( x \), \( y \), and \( z \) points that make up our object of interest, with each position measured relative to the center of mass \( (\bar{x}, \bar{y}, \bar{z}) \).
\[ COV(I_{id}) = \frac{1}{N} \sum_{\forall\vec{v}\in I_{id}} \begin{bmatrix} \vec{v}_x\vec{v}_x & \vec{v}_x\vec{v}_y & \vec{v}_x\vec{v}_z\\ \vec{v}_y\vec{v}_x & \vec{v}_y\vec{v}_y & \vec{v}_y\vec{v}_z\\ \vec{v}_z\vec{v}_x & \vec{v}_z\vec{v}_y & \vec{v}_z\vec{v}_z \end{bmatrix} \]
We then take the eigendecomposition of this matrix to obtain the eigenvectors (principal components, \( \vec{\Lambda}_{1\cdots 3} \)) and eigenvalues (scores, \( \lambda_{1\cdots 3} \))
\[ COV(I_{id}) \rightarrow \begin{bmatrix} \vec{\Lambda}_{1x} & \vec{\Lambda}_{1y} & \vec{\Lambda}_{1z} \\ \vec{\Lambda}_{2x} & \vec{\Lambda}_{2y} & \vec{\Lambda}_{2z} \\ \vec{\Lambda}_{3x} & \vec{\Lambda}_{3y} & \vec{\Lambda}_{3z} \end{bmatrix} * \begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \lambda_3 \end{bmatrix} \] The principal components tell us about the orientation of the object and the scores tell us about the corresponding magnitude (or length) in that direction.
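The whole pipeline can be sketched as follows (assuming numpy; the point cloud is synthetic, an elongated 2D cloud rotated 45° so the bounding-box approach would misjudge it):

```python
import numpy as np

# Synthetic object: elongated voxel cloud (std ~5 along x, ~1 along y),
# then rotated 45 degrees off the coordinate axes
rng = np.random.default_rng(0)
pts = rng.normal(size=(1000, 2)) * [5.0, 1.0]
theta = np.pi / 4
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
pts = pts @ rot.T

centered = pts - pts.mean(axis=0)            # measure from center of mass
cov = centered.T @ centered / len(centered)  # covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order

# sqrt of the largest eigenvalue ~ length scale along the principal axis,
# and the matching eigenvector points ~45 degrees off the x-axis
print(np.sqrt(eigvals[-1]), eigvecs[:, -1])
```

Note that `np.linalg.eigh` is appropriate here because the covariance matrix is symmetric; it returns eigenvalues in ascending order, so the last column of `eigvecs` is the dominant principal component.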
While the eigenvalues and eigenvectors are useful in their own right, built-in functions (princomp or pca in various languages) compute them directly and scale well to very large datasets.
We see that there seems to be a general, albeit weak, correlation between the two measures. The most concerning portion is, however, the left side, where the extents or bounding-box method reports 0 anisotropy while the elliptical method reports substantial amounts of it.
The methods we have covered are all applicable to both 2D and 3D images; the primary difference when looking at 3D images is that there is an extra dimension to consider.